MLPInit: Embarrassingly Simple GNN Training Acceleration with MLP Initialization
Training graph neural networks (GNNs) on large graphs is complex and extremely time-consuming. This is attributed to overheads caused by sparse matrix multiplication, which are sidestepped when training multi-layer perceptrons (MLPs) on node features alone. By ignoring graph context, MLPs are simpler and faster on graph data; however, they usually sacrifice prediction accuracy, which limits their applications for graph data. We observe that for most message-passing GNNs, we can trivially derive an analogous MLP (which we call a PeerMLP) with an equivalent weight space by matching the shapes of the trainable parameters, raising a natural question: how do GNNs perform when using the weights of a fully trained PeerMLP? Surprisingly, we find that GNNs initialized with such weights significantly outperform their PeerMLPs, motivating us to use PeerMLP training as a precursor initialization step for GNN training. To this end, we propose an embarrassingly simple yet highly effective initialization method for GNN training acceleration, called MLPInit.
Our extensive experiments on multiple large-scale graph datasets with diverse GNN architectures validate that MLPInit can accelerate GNN training (up to a 33x speedup on OGB-Products) and often improve prediction performance, e.g., for GraphSAGE across datasets on both node classification and link prediction (measured by Hits@10). The code is available at https://github.com/snap-research/MLPInit-for-GNNs. Comment: Accepted by ICLR 2023.
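A minimal PyTorch sketch of the idea, assuming a one-layer GCN-style model whose trainable parameters are exactly a linear map (the class names and toy data are illustrative, not the authors' code):

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """H' = A_hat @ (H @ W): a message-passing layer whose trainable
    parameters are just a linear map, same shape as an MLP layer."""
    def __init__(self, d_in, d_out):
        super().__init__()
        self.lin = nn.Linear(d_in, d_out)

    def forward(self, a_hat, h):
        return a_hat @ self.lin(h)   # transform, then aggregate over edges

class PeerMLPLayer(nn.Module):
    """Identical weight space to GCNLayer, but ignores the graph."""
    def __init__(self, d_in, d_out):
        super().__init__()
        self.lin = nn.Linear(d_in, d_out)

    def forward(self, h):
        return self.lin(h)

x = torch.randn(4, 8)     # toy node features
a_hat = torch.eye(4)      # stand-in for the normalized adjacency

mlp = PeerMLPLayer(8, 3)
# ... train `mlp` on node features alone (fast: no sparse matmul) ...

gnn = GCNLayer(8, 3)
# MLPInit: start GNN training from the fully trained PeerMLP weights.
gnn.lin.load_state_dict(mlp.lin.state_dict())
out = gnn(a_hat, x)       # fine-tune the GNN from this initialization
```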
Flashlight: Scalable Link Prediction with Effective Decoders
Link prediction (LP) is recognized as an important task in graph learning, with broad practical applications. A typical application of LP is to retrieve the top-scoring neighbors for a given source node, as in friend recommendation. Such services require highly scalable inference that finds the top-scoring neighbors among many candidate nodes at low latency. Recent LP models mainly use two popular decoders to compute edge scores from node embeddings: HadamardMLP and Dot Product. Through theoretical and empirical analysis, we find that HadamardMLP decoders are generally more effective for LP. However, HadamardMLP lacks scalability for retrieving top-scoring neighbors on large graphs, since, to the best of our knowledge, no algorithm can retrieve the top-scoring neighbors for HadamardMLP decoders in sublinear time. To make HadamardMLP scalable, we propose Flashlight, a sublinear algorithm that accelerates top-scoring-neighbor retrieval for HadamardMLP by progressively applying approximate maximum inner product search (MIPS) with adaptively adjusted query embeddings. Empirical results show that Flashlight improves LP inference speed by more than 100x on the large OGBL-CITATION2 dataset without sacrificing effectiveness. Our work paves the way for large-scale LP applications with effective HadamardMLP decoders by greatly accelerating their inference.
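For context, a toy PyTorch sketch of the two decoders being compared (an illustration of the scoring functions, not the paper's implementation or the Flashlight search itself):

```python
import torch
import torch.nn as nn

d = 64
h_src = torch.randn(d)           # source node embedding
h_cand = torch.randn(10_000, d)  # candidate node embeddings

# Dot Product decoder: score(u, v) = <h_u, h_v>. Top-k retrieval
# reduces directly to maximum inner product search (MIPS).
dot_scores = h_cand @ h_src

# HadamardMLP decoder: score(u, v) = MLP(h_u * h_v). Generally more
# effective for LP, but the score is nonlinear in h_v, so top-k
# retrieval no longer maps to a single MIPS query; this is the gap
# that Flashlight's adaptively adjusted queries close.
mlp = nn.Sequential(nn.Linear(d, d), nn.ReLU(), nn.Linear(d, 1))
hadamard_scores = mlp(h_src * h_cand).squeeze(-1)

topk = hadamard_scores.topk(10).indices  # exact, linear-time baseline
```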
Empowering Graph Representation Learning with Test-Time Graph Transformation
As powerful tools for representation learning on graphs, graph neural
networks (GNNs) have facilitated various applications from drug discovery to
recommender systems. Nevertheless, the effectiveness of GNNs is immensely
challenged by issues related to data quality, such as distribution shift,
abnormal features, and adversarial attacks. Recent efforts tackle these issues from a modeling perspective, which incurs the additional cost of changing model architectures or re-training model parameters. In this work, we take a data-centric view of these issues and propose a graph transformation framework, GTrans, which adapts and refines graph data at test time to achieve better performance. We provide a theoretical analysis of the framework's design and discuss why adapting the graph data works better than adapting the model. Extensive experiments demonstrate the effectiveness of GTrans in three distinct scenarios across eight benchmark datasets with suboptimal data. Remarkably, GTrans performs best in most cases, with improvements of up to 2.8%, 8.2%, and 3.8% over the best baselines in the three experimental settings.
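A hedged sketch of the data-centric recipe: keep the trained model frozen and optimize an additive edit of the test-time node features against a label-free surrogate. The surrogate below (a smoothness proxy) and the stand-in model are assumptions for illustration; the paper's actual objective and its structure refinement are more involved.

```python
import torch
import torch.nn as nn

def gtrans_style_adapt(model, a_hat, x, surrogate, steps=20, lr=1e-2):
    """Refine test-time node features while the trained model stays frozen."""
    for p in model.parameters():
        p.requires_grad_(False)           # adapt the data, not the model
    delta = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        z = model(a_hat, x + delta)
        surrogate(a_hat, z).backward()    # label-free objective
        opt.step()
    return (x + delta).detach()

# Illustrative stand-ins (assumptions, not the paper's components):
class TinyGNN(nn.Module):
    def __init__(self, d_in, d_out):
        super().__init__()
        self.lin = nn.Linear(d_in, d_out)
    def forward(self, a_hat, h):
        return a_hat @ self.lin(h)

# Smoothness proxy: predictions should vary little across the graph.
smooth = lambda a_hat, z: ((a_hat @ z - z) ** 2).mean()
x_new = gtrans_style_adapt(TinyGNN(8, 3), torch.eye(4), torch.randn(4, 8), smooth)
```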
Data Augmentation for Graph Neural Networks
Data augmentation has been widely used to improve generalizability of machine
learning models. However, comparatively little work studies data augmentation
for graphs. This is largely due to the complex, non-Euclidean structure of
graphs, which limits possible manipulation operations. Augmentation operations
commonly used in vision and language have no analogs for graphs. Our work
studies graph data augmentation for graph neural networks (GNNs) in the context
of improving semi-supervised node classification. We discuss practical and theoretical motivations, considerations, and strategies for graph data augmentation. Our work shows that neural edge predictors can effectively encode class-homophilic structure to promote intra-class edges and demote inter-class edges in a given graph structure, and our main contribution introduces the GAug graph data augmentation framework, which leverages these insights to improve performance in GNN-based node classification via edge prediction. Extensive experiments on multiple benchmarks show that augmentation via GAug improves performance across GNN architectures and datasets. Comment: AAAI 2021. This complete version contains the Appendix.
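A rough sketch of the edge-prediction recipe, assuming a dense adjacency and inner-product edge scores from a pre-trained predictor's embeddings (GAug's actual variants and edge predictor are more elaborate): promote the highest-probability missing edges and demote the lowest-probability existing ones before GNN training.

```python
import torch

def augment_adjacency(z, a, add_frac=0.05, drop_frac=0.05):
    """z: node embeddings from a trained edge predictor; a: dense 0/1
    symmetric adjacency. Add the most likely missing edges and drop the
    least likely existing ones."""
    n = a.size(0)
    probs = torch.sigmoid(z @ z.t())                      # pairwise edge probabilities
    iu = torch.triu(torch.ones(n, n), diagonal=1).bool()  # upper-triangle mask
    up = a.triu(diagonal=1).clone()                       # edit one triangle only
    existing = iu & (up == 1)
    missing = iu & (up == 0)
    m = int(existing.sum())

    # Promote the top-scoring missing pairs to edges.
    cand = torch.where(missing, probs, torch.full_like(probs, -1.0))
    add_idx = cand.flatten().topk(max(1, int(add_frac * m))).indices
    up.view(-1)[add_idx] = 1.0

    # Demote the lowest-scoring existing edges.
    cand = torch.where(existing, probs, torch.full_like(probs, 2.0))
    drop_idx = (-cand).flatten().topk(max(1, int(drop_frac * m))).indices
    up.view(-1)[drop_idx] = 0.0

    return up + up.t()                                    # symmetric augmented graph

# Toy usage with random embeddings standing in for a trained predictor.
a = (torch.rand(6, 6) > 0.6).float().triu(1)
a = a + a.t()
a_aug = augment_adjacency(torch.randn(6, 16), a)
```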
Link Prediction with Non-Contrastive Learning
A recent focal area in the space of graph neural networks (GNNs) is graph
self-supervised learning (SSL), which aims to derive useful node
representations without labeled data. Notably, many state-of-the-art graph SSL
methods are contrastive methods, which use a combination of positive and
negative samples to learn node representations. Owing to challenges in negative
sampling (slowness and model sensitivity), recent literature introduced
non-contrastive methods, which instead only use positive samples. Though such
methods have shown promising performance in node-level tasks, their suitability for link prediction tasks, which are concerned with predicting link existence between pairs of nodes (and have broad applicability in recommender system contexts), is yet unexplored. In this work, we extensively evaluate the
performance of existing non-contrastive methods for link prediction in both
transductive and inductive settings. While most existing non-contrastive
methods perform poorly overall, we find that, surprisingly, BGRL generally
performs well in transductive settings. However, it performs poorly in the more
realistic inductive settings where the model has to generalize to links to/from
unseen nodes. We find that non-contrastive models tend to overfit to the
training graph and use this analysis to propose T-BGRL, a novel non-contrastive
framework that incorporates cheap corruptions to improve the generalization
ability of the model. This simple modification strongly improves inductive
performance in 5/6 of our datasets, with up to a 120% improvement in Hits@50, all with comparable speed to other non-contrastive baselines and up to 14x faster than the best-performing contrastive baseline. Our work imparts interesting findings about non-contrastive learning for link prediction and paves the way for future researchers to further expand upon this area. Comment: ICLR 2023. 19 pages, 6 figures.
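A hedged sketch of the T-BGRL-style loss: in addition to pulling the online prediction toward the target embedding of the clean view, push it away from the target embedding of a cheaply corrupted graph. The cosine-based form, the corruption (feature shuffling), and the weighting `lam` are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def tbgrl_style_loss(online_pred, target_clean, target_corrupt, lam=1.0):
    """online_pred: predictor output for one augmented view (grads flow here);
    target_clean: target-network embedding of the other view;
    target_corrupt: target-network embedding of a cheaply corrupted graph."""
    pos = F.cosine_similarity(online_pred, target_clean.detach(), dim=-1)
    neg = F.cosine_similarity(online_pred, target_corrupt.detach(), dim=-1)
    # Pull toward the clean target, push away from the corrupted one.
    return (-pos + lam * neg).mean()

# A cheap corruption for illustration: shuffle node features so embeddings
# no longer line up with the graph structure.
x = torch.randn(5, 16)
x_corrupt = x[torch.randperm(x.size(0))]
```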
Knowing your FATE: Friendship, Action and Temporal Explanations for User Engagement Prediction on Social Apps
With the rapid growth and prevalence of social network applications (Apps) in
recent years, understanding user engagement has become increasingly important for providing useful insights into future App design and development. While several promising neural modeling approaches have recently been pioneered for accurate user engagement prediction, their black-box designs unfortunately limit model explainability. In this paper, we study a novel problem of explainable
user engagement prediction for social network Apps. First, we propose a
flexible definition of user engagement for various business scenarios, based on
future metric expectations. Next, we design an end-to-end neural framework,
FATE, which incorporates three key factors that we identify to influence user
engagement, namely friendships, user actions, and temporal dynamics, to achieve explainable engagement predictions. FATE is based on a tensor-based graph neural network (GNN), an LSTM, and a mixture attention mechanism, which allows for
(a) predictive explanations based on learned weights across different feature
categories, (b) reduced network complexity, and (c) improved performance in
both prediction accuracy and training/inference time. We conduct extensive
experiments on two large-scale datasets from Snapchat, where FATE outperforms
state-of-the-art approaches in both error and runtime reduction. We also evaluate explanations from FATE, showing strong quantitative and qualitative performance. Comment: Accepted to KDD 2020 Applied Data Science Track.
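A small sketch of the mixture-attention idea that underpins the predictive explanations: attention weights over per-category representations (friendship, action, temporal) both fuse the factors and quantify each one's contribution. The parameterization below is an assumption for illustration, not the paper's exact module.

```python
import torch
import torch.nn as nn

class MixtureAttention(nn.Module):
    """Attention over per-category representations; the learned weights
    fuse the factors and double as per-category explanations."""
    def __init__(self, d):
        super().__init__()
        self.score = nn.Linear(d, 1)

    def forward(self, cat_embs):                  # (batch, n_categories, d)
        w = torch.softmax(self.score(cat_embs).squeeze(-1), dim=-1)
        fused = (w.unsqueeze(-1) * cat_embs).sum(dim=1)
        return fused, w                           # w explains each prediction

# Toy usage: 8 users, 3 factor categories (friendship, action, temporal).
fused, weights = MixtureAttention(32)(torch.randn(8, 3, 32))
```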
Graph Condensation for Graph Neural Networks
Given the prevalence of large-scale graphs in real-world applications, the storage and training time of neural models have raised increasing concerns. To alleviate these concerns, we propose and study the problem of graph condensation
for graph neural networks (GNNs). Specifically, we aim to condense the large,
original graph into a small, synthetic, and highly informative graph, such that
GNNs trained on the small graph and large graph have comparable performance. We
approach the condensation problem by imitating the GNN training trajectory on
the original graph through the optimization of a gradient matching loss and
design a strategy to condense node features and structural information
simultaneously. Extensive experiments have demonstrated the effectiveness of
the proposed framework in condensing different graph datasets into informative
smaller graphs. In particular, the condensed graphs reach 95.3% of the original test accuracy on Reddit, 99.8% on Flickr, and 99.0% on Citeseer, while reducing the graph size by more than 99.9%, and they can be used to train various GNN architectures. Comment: 16 pages, 4 figures.
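A minimal sketch of one step of the gradient-matching objective, under simplifying assumptions (a single model state and a cosine gradient distance; the actual framework matches the training trajectory and condenses structure as well):

```python
import torch

def gradient_matching_loss(model, loss_fn, real, syn):
    """real/syn are (inputs, labels) pairs; for a GNN the inputs would
    bundle the normalized adjacency with the node features. The synthetic
    inputs are the trainable condensed data."""
    x_r, y_r = real
    x_s, y_s = syn
    g_real = torch.autograd.grad(loss_fn(model(x_r), y_r),
                                 list(model.parameters()))
    g_syn = torch.autograd.grad(loss_fn(model(x_s), y_s),
                                list(model.parameters()),
                                create_graph=True)  # grads reach the syn data
    # Cosine distance between the two gradient sets, summed over layers.
    return sum(1 - torch.nn.functional.cosine_similarity(
        gr.flatten(), gs.flatten(), dim=0) for gr, gs in zip(g_real, g_syn))

# Toy usage: a linear "GNN"; the condensed node features are trainable.
model = torch.nn.Linear(8, 3)
x_real, y_real = torch.randn(100, 8), torch.randint(0, 3, (100,))
x_syn = torch.randn(10, 8, requires_grad=True)
y_syn = torch.randint(0, 3, (10,))
loss = gradient_matching_loss(model, torch.nn.functional.cross_entropy,
                              (x_real, y_real), (x_syn, y_syn))
loss.backward()  # gradients flow into x_syn, the condensed features
```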
Linkless Link Prediction via Relational Distillation
Graph Neural Networks (GNNs) have shown exceptional performance in the task
of link prediction. Despite their effectiveness, the high latency brought by
non-trivial neighborhood data dependency limits GNNs in practical deployments.
Conversely, MLPs, though known to be efficient, are much less effective than GNNs due to their lack of relational knowledge. In this work, to combine the advantages of GNNs and MLPs, we start by exploring direct knowledge distillation (KD) methods for link prediction, i.e., predicted-logit-based matching and node-representation-based matching. Upon observing that direct KD analogs do not perform
well for link prediction, we propose a relational KD framework, Linkless Link
Prediction (LLP), to distill knowledge for link prediction with MLPs. Unlike
simple KD methods that match independent link logits or node representations,
LLP distills relational knowledge that is centered around each (anchor) node to
the student MLP. Specifically, we propose rank-based matching and
distribution-based matching strategies that complement each other. Extensive
experiments demonstrate that LLP boosts the link prediction performance of MLPs by significant margins and even outperforms the teacher GNNs on 7 out of 8 benchmarks. LLP also achieves a 70.68x speedup in link prediction inference compared to GNNs on the large-scale OGB dataset.
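A small sketch of the distribution-based matching component as described: for each anchor node, align the student MLP's score distribution over sampled context nodes with the teacher GNN's via a softened KL divergence. The temperature and shapes are illustrative; the complementary rank-based matching term is omitted.

```python
import torch
import torch.nn.functional as F

def distribution_matching_loss(teacher_scores, student_scores, tau=1.0):
    """For each anchor node, align the student MLP's score distribution
    over its sampled context nodes with the teacher GNN's, via KL
    divergence over temperature-softened softmaxes."""
    t_prob = F.softmax(teacher_scores / tau, dim=-1).detach()
    s_logp = F.log_softmax(student_scores / tau, dim=-1)
    return F.kl_div(s_logp, t_prob, reduction="batchmean")

# Toy usage: 4 anchors, each with teacher/student scores against the same
# 20 sampled context nodes (values are illustrative).
t = torch.randn(4, 20)
s = torch.randn(4, 20, requires_grad=True)
distribution_matching_loss(t, s).backward()
```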